Analysing Temporally Annotated Corpora with CAVaT 
Leon Derczynski, Robert Gaizauskas 






University of Sheffield 

211 Portobello, Sheffield S1 4DP, UK

L.Derczynski@dcs.shef.ac.uk, R.Gaizauskas@dcs.shef.ac.uk 

Abstract 

We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility 
for statistical analysis of features specific to temporally-annotated natural language corpora. It provides reporting, highlights salient links 
between a variety of general and time-specific linguistic features, and also validates a temporal annotation to ensure that it is logically 
consistent and sufficiently annotated. Uniquely, CAVaT provides analysis specific to TimeML-annotated temporal information. TimeML 
is a standard for annotating temporal information in natural language text. In this paper, we present the reporting part of CAVaT, and 
then its error-checking ability, including the workings of several novel TimeML document verification methods. This is followed by 
the execution of some example tasks using the tool to show relations between times, events, signals and links. We also demonstrate 
inconsistencies in a TimeML corpus (TimeBank) that have been detected with CAVaT. 



1. Introduction 

In essence, TimeML mandates the markup of expressions referring to times, expressions denoting events, and expressions signalling temporal relations between times and events or between pairs of events; it also allows links to be added between these entities, each labelled with the temporal relation holding between them.

Existing TimeML tools can be divided into two categories: those which produce or alter markup, for example to assist annotation, and those which perform analysis. Only a few tools have as yet been developed for TimeML, mostly focusing on the annotation task; TTK (Verhagen and Pustejovsky, 2008), for example, does not support analysis. For the second category, in the absence of dedicated software, the TimeML community is restricted to generic XML analysis tools such as Xaira (Burnard and Dodd, 2003) or LT-XML¹, as well as tools for similar formats (TEI). These generic corpus tools are powerful applications, but require substantial effort to apply to TimeML data.
We have constructed CAVaT (Corpus Analysis and Validation for TimeML) to process collections of temporally annotated documents. CAVaT's functionality is divided into two main parts: an integrated browsing and report generation system, and a modular, extensible error-checking and corpus validation framework.

In this paper, we first describe the technical aspects of the tool, then present the reporting part of CAVaT and its error-checking ability, followed by the execution of some example tasks using the tool. Section 2 gives an overview of the tool's operation and capabilities, including details of the corpus loading and folding process (Section 2.1), report generation (Section 2.2) and a detailed explanation of the advanced validation modules included with CAVaT (Section 2.3). A brief syntax summary is presented in Section 3; the full guide is on the CAVaT website². In Section 4, we present sample queries and output. In Section 5, we show inconsistencies and observations in a TimeML corpus (TimeBank) that have been detected with CAVaT. Finally, Section 6 summarises the tool and discusses future work.

2. Overview of the tool 

CAVaT is an open source tool, constructed from a set of Python modules and a database. It uses NLTK³ and MySQL⁴. The interface is a text-based interactive prompt, and all operations are performed with text commands. Command syntax strives to be simple, flexible and close to natural language. After loading and pre-processing a TimeML corpus, one can analyse it using built-in reporting functions and validate it using any of several checking components.

2.1. Preprocessing 

CAVaT can work on any TimeML-annotated corpus that is stored as a collection of uncompressed files in a single directory, by importing it into a set of database tables. The corpus is first processed by an XML parser (using Python's minidom and expat implementations), which retrieves document-level data as well as all temporally annotated information, and places it into a MySQL database. Temporally annotated data includes all TimeML tags and their attributes, as well as any enclosed tokens for EVENT, SIGNAL and TIMEX3 tags.
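The extraction step can be sketched with Python's xml.dom.minidom. This is a minimal illustration, not CAVaT's importer: it assumes each document parses as one well-formed XML tree, the row layout (tag name, attribute dictionary, enclosed text) is our own choice, and nothing is written to MySQL.

```python
from xml.dom import minidom

# Tags whose enclosed tokens are kept, as described above.
TEXT_TAGS = {"EVENT", "SIGNAL", "TIMEX3"}
# All TimeML tags extracted together with their attributes.
TIMEML_TAGS = TEXT_TAGS | {"MAKEINSTANCE", "TLINK", "SLINK", "ALINK"}


def element_text(node):
    """Concatenate the text nodes directly enclosed by an element."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def extract_timeml(xml_string):
    """Return one (tag, attribute dict, enclosed text or None) row per TimeML tag."""
    dom = minidom.parseString(xml_string)
    rows = []
    for tag in sorted(TIMEML_TAGS):
        for node in dom.getElementsByTagName(tag):
            attrs = dict(node.attributes.items())
            text = element_text(node) if tag in TEXT_TAGS else None
            rows.append((tag, attrs, text))
    return rows
```

Each row maps naturally onto one database record per tag; link tags carry no text, so their third field is None.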

In TimeML, events are represented with the EVENT tag, 
and temporal expressions with the TIMEX3 tag. These 
intervals are the elements which CAVaT and the rest of 
this paper assume as temporal primitives, unless otherwise 
stated. Temporal relations between intervals are described 
with the TLINK tag, and temporal signals with SIGNAL. 
See Figure 1 for an example. 

Automatically classifying the type of temporal relation between intervals is currently a difficult problem in the temporal processing of text (Mani et al., 2006; Lapata and Lascarides, 2006; Hepple et al., 2007). The task is often made simpler by reducing the number of temporal link classes. TimeML includes both BEFORE and AFTER relations, though


1 From http://www.ltg.ed.ac.uk/software/ltxml
2 Available at http://code.google.com/p/cavat/
3 See http://www.nltk.org/
4 See http://www.mysql.com/



Figure 1: Example text and TimeML annotation.

Text: "On Thursday, he took the plane to Copenhagen."
TimeML:

<SIGNAL sid="s1">On</SIGNAL> <TIMEX3 tid="t1" type="DATE">Thursday</TIMEX3>,
he <EVENT eid="e1" class="OCCURRENCE">took</EVENT> the plane to Copenhagen.

<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PAST" polarity="POS" />
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" signalID="s1" relType="DURING" />



Table 1: Mappings between TimeML relations that can be applied in order to reduce the size of the relation set; when applying the transformation in the table, TLINK argument order is swapped.

Original relation type    Folds to relation
AFTER                     BEFORE
IS_INCLUDED               INCLUDES
IAFTER                    IBEFORE
BEGUN_BY                  BEGINS
ENDED_BY                  ENDS
DURING_INV                SIMULTANEOUS
DURING                    SIMULTANEOUS
SIMULTANEOUS              SIMULTANEOUS



one may simply reverse the arguments of a BEFORE relation to turn it into an AFTER one: "June 2008 was before August 2009" is equivalent to "August 2009 was after June 2008". It is thus possible to convert all links of one of these types into the other. We call this technique folding. Given a set of such mappings, the 13 TimeML relations can be reduced to a smaller set.
CAVaT offers three folds: 

• CAVaT fold - Collapses all inverse relations, such as mapping IS_INCLUDED to INCLUDES (see Table 1).

• SputLink fold - The mapping introduced by Marc Verhagen, included in TTK (Verhagen, 2005).

• Compact fold - Reduces TimeML's link relation set to 3 classes, using mappings defined in Setzer et al. (2005).

The first two are lossless, in that no temporal information 
is removed by the folding process. The third is lossy. It is 
possible to perform a lossy fold by, for example, reducing 
the TimeML BEGUN_BY relationship to one of INCLUDES. 
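A lossless fold of the CAVaT kind can be sketched as a lookup that swaps TLINK arguments whenever the relation type is the inverse member of a pair. This is an illustration under stated assumptions: it covers only the five inverse pairs from Table 1 whose reading is unambiguous, a TLINK is a hypothetical (arg1, relType, arg2) tuple, and all other relation types pass through unchanged.

```python
# Inverse relation -> canonical relation (the five inverse pairs of Table 1).
# Folding a link of an inverse type swaps its arguments, so no information is lost.
INVERSE_TO_CANONICAL = {
    "AFTER": "BEFORE",
    "IS_INCLUDED": "INCLUDES",
    "IAFTER": "IBEFORE",
    "BEGUN_BY": "BEGINS",
    "ENDED_BY": "ENDS",
}


def fold(tlink):
    """Return an equivalent (arg1, relType, arg2) tuple using the canonical relation."""
    arg1, rel_type, arg2 = tlink
    if rel_type in INVERSE_TO_CANONICAL:
        # "X AFTER Y" carries the same information as "Y BEFORE X".
        return (arg2, INVERSE_TO_CANONICAL[rel_type], arg1)
    return tlink


def fold_corpus(tlinks):
    """Apply the fold to every TLINK in a list."""
    return [fold(t) for t in tlinks]
```

Because the mapping only ever produces canonical types, folding an already folded link leaves it unchanged.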

2.2. Querying 

The reporting part of CAVaT makes analysis of TimeML corpora simpler than working directly with a set of XML documents, allowing flexible queries and catering for inquiries specific to temporally annotated data.
The development of CAVaT has been driven by investigations of TimeML corpora. Many of the operations performed on such corpora had common elements, often centred around the retrieval of class distributions or token frequencies. A tool for TimeML corpus research could encompass all the required operations while providing access to a larger range of reports.



CAVaT uses a report generation system in which one can view any number of pre-defined features that match conditions of the user's choosing. Queries can produce reports at varying levels of granularity: one may choose to examine data at sentence, document or corpus level. Reports can output counts, distributions, lists or text extracts; example queries are listed in Section 4. Data such as part-of-speech information, tense, aspect and event recurrence are captured by attributes described by TimeML, and any data like this (annotated by tags and their attributes) can be queried. In addition, properties specific to temporal data but not directly present in the markup are implemented, including:

• Event / event instance abstraction: In some cases, one piece of text may refer to two separate events (an example is given in Section 5.1). To permit annotation of this, TimeML's EVENT tag is placed around the text, and event instances are then specified using one or more MAKEINSTANCE tags. Data relating to a piece of event text, such as part of speech, polarity and modality, are described in the MAKEINSTANCE tag. However, we would often like to see the part-of-speech data for an event; indeed, when discussing temporal entities, the term "event" is often used in place of "event instance". CAVaT therefore translates implicitly between these two related tags when requested, for example when one asks to see event modality or cardinality.

• Signalled links: TLINKs may indicate a textual signal that suggests the type of relationship between their arguments. For example, in "Lydia ate dinner before leaving the house", the word "before" acts as a signal, ordering two events. As signals are explicit indicators of temporal association, and correctly typing a temporal link is difficult, it is useful to be able to identify quickly which links employ a signal.

• Signal text and TLINKs: As SIGNAL text referenced from a TLINK may be thought of as that TLINK's signal text, CAVaT permits queries that specify signal text as an attribute of a TLINK.

• Text position and lemma: Although these are not part of the TimeML annotation schema, CAVaT logs text position (by sentence number and word number) and maintains lemmas of text found within tags.
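To illustrate how a relational store makes these derived properties queryable, the sketch below uses an in-memory SQLite database in place of MySQL. The table and column names are hypothetical, not CAVaT's schema: a TLINK's signal text comes from a left join on signalID (NULL for unsignalled links), and an event's part of speech from joining EVENT to its MAKEINSTANCE rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table per TimeML tag, one column per attribute.
conn.executescript("""
    CREATE TABLE event    (eid TEXT PRIMARY KEY, text TEXT, class TEXT);
    CREATE TABLE instance (eiid TEXT PRIMARY KEY, eventID TEXT, pos TEXT, tense TEXT);
    CREATE TABLE signal   (sid TEXT PRIMARY KEY, text TEXT);
    CREATE TABLE tlink    (lid TEXT PRIMARY KEY, arg1 TEXT, arg2 TEXT,
                           relType TEXT, signalID TEXT);
    -- The annotation of Figure 1, plus one unsignalled link for contrast.
    INSERT INTO event    VALUES ('e1', 'took', 'OCCURRENCE');
    INSERT INTO instance VALUES ('ei1', 'e1', 'VERB', 'PAST');
    INSERT INTO signal   VALUES ('s1', 'On');
    INSERT INTO tlink    VALUES ('l1', 'ei1', 't1', 'DURING', 's1');
    INSERT INTO tlink    VALUES ('l2', 'ei1', 't0', 'AFTER', NULL);
""")

# Signal text treated as an attribute of each TLINK.
SIGNAL_TEXT = """
    SELECT t.lid, t.relType, s.text
    FROM tlink AS t LEFT JOIN signal AS s ON t.signalID = s.sid
    ORDER BY t.lid
"""

# Event / event instance abstraction: instance attributes reported per event.
EVENT_POS = """
    SELECT e.eid, e.text, i.pos
    FROM event AS e JOIN instance AS i ON i.eventID = e.eid
"""

signal_rows = conn.execute(SIGNAL_TEXT).fetchall()
pos_rows = conn.execute(EVENT_POS).fetchall()
```

Filtering the first query with `WHERE t.signalID IS NOT NULL` would list only signalled links.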

One may view a particular TLINK's location in the original document, showing the link's arguments and their relation type. This helps one understand the context of a single TLINK. For example, one may see many links to a single document date, or discover that most links have arguments within the same paragraph: something not immediately obvious to humans browsing the TLINK markup, and unclear with generic corpus tools.

2.3. Checking 

Temporal annotation is a complex task, and as a result, a relatively small amount of text has been annotated to date. The largest TimeML corpus is TimeBank (Pustejovsky et al., 2003), with fewer than 200 documents and around 65,000



Figure 2: With time flowing from left to right, this represents A 
BEFORE B and B INCLUDES C. It is not possible for C and A to 
be the same interval. 



Table 2: Mapping from TimeML relation types to a simple point-based temporal algebra. The TimeML relation is of the form a RELATION b. Where multiple relations are given, all hold. Similar to the table in Verhagen (2005).






tokens. Because of the complexity of temporal annotation, errors can arise beyond those that may be detected using an XML DTD. CAVaT is both a reporting and a validation tool, and seeks to detect automatically high-level and complex errors that are rarely immediately obvious. Part of the motivation behind this part of the tool is similar to that of writing unit tests that highlight bugs in an application: to improve quality by automatically detecting previously seen errors. In this section, we detail some checks that CAVaT can perform on a TimeML corpus.

Error checks are defined as Python modules, so that one 
may describe a detection method for an error case and share 
it with other researchers without modifying CAVaT's core 
code. The modules inherit from the cavatModule class; 
documentation is in the source code, and one may view a 
list of available modules with the command check list. 

2.3.1. Inconsistent closure 

It is possible to create an inconsistent configuration of temporal links. For example, we may have A BEFORE B and B INCLUDES A; this is clearly impossible, as INCLUDES stipulates that the start point of A occurs after the start point of B, whereas BEFORE requires A to end before B starts (see Figure 2). While this example is fairly clear, it may not be at all obvious to human annotators that a partial temporal link annotation could imply an inconsistent configuration.

Automatically checking the consistency of a temporal network is hard. TimeML's relations are based on those of Allen (1983), and it is difficult to guarantee the consistency of networks formed using Allen's relations (Vilain et al., 1989; Tsang, 1987). We therefore restate the problem in a simpler fashion, as follows. Intervals are represented by pairs of endpoints, and we define intervals, and the TimeML relations between them, in terms of relations between these points. Our model uses only simultaneous (=) and before (<) relations.

The consistency checker works in a similar way to the closure algorithm in Setzer et al. (2005). It maintains an agenda and a database. Assertions are taken from the agenda and combined with assertions in the database to infer further assertions. We first process the intervals in the document (taken from TLINK arguments): for each interval $I$, we add $I_{start} < I_{end}$ to the database. We then generate initial data for the agenda from the TLINKs in the document, mapping each TLINK to one or more assertions as listed in Table 2.

The only inference rules needed with our minimal set of relations are:

• If $x = y$ then $y = x$

• If $x = y$ and $y = z$ then $x = z$

• If $x < y$ and $y < z$ then $x < z$



TimeML relation type    Relation added to agenda
BEFORE                  $a_{end} < b_{start}$
AFTER                   $b_{end} < a_{start}$
IAFTER                  $b_{end} = a_{start}$
IBEFORE                 $a_{end} = b_{start}$
INCLUDES                $a_{start} < b_{start}$, $b_{end} < a_{end}$
IS_INCLUDED             $b_{start} < a_{start}$, $a_{end} < b_{end}$
BEGINS                  $a_{start} = b_{start}$, $a_{end} < b_{end}$
BEGUN_BY                $a_{start} = b_{start}$, $b_{end} < a_{end}$
ENDS                    $a_{end} = b_{end}$, $b_{start} < a_{start}$
ENDED_BY                $a_{end} = b_{end}$, $a_{start} < b_{start}$
SIMULTANEOUS            $a_{start} = b_{start}$, $a_{end} = b_{end}$
IDENTITY                $a_{start} = b_{start}$, $a_{end} = b_{end}$
DURING                  $a_{start} = b_{start}$, $a_{end} = b_{end}$
DURING_INV              $a_{start} = b_{start}$, $a_{end} = b_{end}$



We then take items from the agenda one at a time. For each item, we compare it against the database and deduce new relations using the above rules. If a newly generated relation conflicts with anything in the agenda or database, the document is inconsistent. Otherwise, we move the item from the agenda to the database and add the newly generated relations to the agenda. If we can clear the agenda, the document is consistent; otherwise, it is not. Whether we add new relations to the top or the bottom of the agenda (achieving depth- or breadth-first search, respectively) is irrelevant to the success of the algorithm, though differences in computational performance have not been measured.
Our baselines are the results of the tlink_loop test (Section 2.3.3) and the results of closure success according to SputLink (Verhagen, 2005). This algorithm detected all known inconsistencies in TimeBank, and found one more; full details are given in Section 5.2. A test TimeML corpus is included with CAVaT for verifying that the consistency checker works; alternative implementations may also use it for validation.
Below is sample output from a consistency check: 

cavat> check consistent in 3

# Temporal graph consistency checker v1 loaded

# Checking wsj_0927.tml (id 3)

! Inconsistent closure - could not assert (ei2415_2 < ei2414_1)
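The agenda procedure can be sketched in Python as below. This is an illustrative reimplementation under stated assumptions, not CAVaT's source: TLINKs are hypothetical (arg1, relType, arg2) tuples, endpoint names such as e1_start are our own convention, and the point mapping follows Table 2. As an assumption on our part, the derivation step also chains mixed assertions (if $x = y$ and $y < z$ then $x < z$), which is what lets a cycle such as A IBEFORE B with B BEFORE A be caught. A document is reported inconsistent as soon as some point is derived to precede itself.

```python
def pt(interval, side):
    """Name an interval endpoint, e.g. pt("e1", "start") -> "e1_start"."""
    return f"{interval}_{side}"


S, E = "start", "end"
# Table 2: "a RELATION b" as (lhs, op, rhs) assertions over the endpoints of a and b.
POINT_RELATIONS = {
    "BEFORE":      [(("a", E), "<", ("b", S))],
    "AFTER":       [(("b", E), "<", ("a", S))],
    "IBEFORE":     [(("a", E), "=", ("b", S))],
    "IAFTER":      [(("b", E), "=", ("a", S))],
    "INCLUDES":    [(("a", S), "<", ("b", S)), (("b", E), "<", ("a", E))],
    "IS_INCLUDED": [(("b", S), "<", ("a", S)), (("a", E), "<", ("b", E))],
    "BEGINS":      [(("a", S), "=", ("b", S)), (("a", E), "<", ("b", E))],
    "BEGUN_BY":    [(("a", S), "=", ("b", S)), (("b", E), "<", ("a", E))],
    "ENDS":        [(("a", E), "=", ("b", E)), (("b", S), "<", ("a", S))],
    "ENDED_BY":    [(("a", E), "=", ("b", E)), (("a", S), "<", ("b", S))],
}
for _rel in ("SIMULTANEOUS", "IDENTITY", "DURING", "DURING_INV"):
    POINT_RELATIONS[_rel] = [(("a", S), "=", ("b", S)), (("a", E), "=", ("b", E))]


def derive(fact, db):
    """Apply the inference rules to `fact` and every assertion already in `db`."""
    x, op, y = fact
    new = [(y, "=", x)] if op == "=" else []        # if x = y then y = x
    for p, op2, q in db:
        # = chained with = gives =; anything chained with < gives <
        # (the mixed = / < case is our assumed extra rule).
        joined = "<" if "<" in (op, op2) else "="
        if y == p:                                  # x op y and y op2 q
            new.append((x, joined, q))
        if q == x:                                  # p op2 x and x op y
            new.append((p, joined, y))
    return new


def find_inconsistency(tlinks):
    """Return the first derived assertion p < p, or None if the TLINKs are consistent."""
    intervals = {i for a, _, b in tlinks for i in (a, b)}
    # Database seeded with start < end for every interval named by a TLINK.
    db = {(pt(i, S), "<", pt(i, E)) for i in intervals}
    # Agenda seeded from the TLINKs via the Table 2 mapping.
    agenda = []
    for a, rel_type, b in tlinks:
        args = {"a": a, "b": b}
        for (lhs, lside), op, (rhs, rside) in POINT_RELATIONS[rel_type]:
            agenda.append((pt(args[lhs], lside), op, pt(args[rhs], rside)))
    while agenda:
        fact = agenda.pop()                 # taking from the end gives depth-first order
        if fact in db:
            continue
        x, op, y = fact
        if op == "<" and x == y:
            return fact                     # a point cannot precede itself
        new = derive(fact, db)
        db.add(fact)
        agenda.extend(n for n in new if n not in db)
    return None
```

For the example at the start of this section, `find_inconsistency([("A", "BEFORE", "B"), ("B", "INCLUDES", "A")])` returns an assertion rather than None. The agenda only ever holds finitely many distinct point assertions, so the loop terminates.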

2.3.2. Disconnected sub-graph detection 

After inferring a temporal closure (Verhagen, 2005) of a document, one would ideally be left with a single interconnected temporal graph, where nodes are TIMEX3s or EVENTs and edges represent TLINKs. However, disconnected groups of links may remain after closure. This should be brought to the attention of the user: it often indicates that an annotation is incomplete, and that annotating a small number of additional links could greatly increase the amount of data inferable through closure.

CAVaT's sub-graph identification module, split_graph, works by processing the TLINKs of a document sequentially. We maintain a list of sets that will hold interconnected intervals, beginning with an empty list. For each TLINK, we check whether either of its arguments (which are both intervals) can be found in any set in the list. If one argument is found but the other is not, the new interval is added to the set containing the found interval. If both are found in the same set, no action is taken. If they are found in different sets, those two sets are merged. If neither TLINK argument can be found anywhere, a new set holding both intervals is created. This process is repeated until all TLINKs have been processed, at which point each set in the list represents an independent sub-graph of connected intervals.
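The set-merging procedure just described can be sketched in a few lines of Python. This is our illustration rather than CAVaT's module: each TLINK is reduced to a hypothetical (arg1, arg2) pair of interval IDs, and the function returns the list of sets from which the statistics below can be computed.

```python
def split_graph(tlinks):
    """Group intervals into connected sub-graphs, processing (arg1, arg2) pairs in order."""
    subgraphs = []                          # list of disjoint sets of interval IDs
    for a, b in tlinks:
        set_a = next((s for s in subgraphs if a in s), None)
        set_b = next((s for s in subgraphs if b in s), None)
        if set_a is None and set_b is None:
            subgraphs.append({a, b})        # neither seen: new sub-graph ({a} for a loop)
        elif set_b is None:
            set_a.add(b)                    # only a seen: b joins a's sub-graph
        elif set_a is None:
            set_b.add(a)                    # only b seen: a joins b's sub-graph
        elif set_a is not set_b:
            set_a |= set_b                  # seen in different sub-graphs: merge them
            subgraphs = [s for s in subgraphs if s is not set_b]
        # both seen in the same sub-graph: nothing to do
    return subgraphs
```

From the returned sets, mean sub-graph size is the total number of intervals divided by the number of sets, and the largest sub-graph's share is its size over that total; for the split_graph sample output in this section, 69 / 13 ≈ 5.3 and 35 / 69 ≈ 50.7%. A union-find structure would do the same work in near-linear time; the linear scan over sets simply mirrors the prose.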

The module then reports statistics about the graph(s) found in the specified document. These include:

• The number of sub-graphs, intervals and TLINKs;

• The number of "isolated" sub-graphs (those described by only one temporal link), and the proportion of intervals and links used to describe all these isolated sub-graphs;

• Mean and maximum sub-graph size, and the proportion of the document's intervals that are in the largest sub-graph;

• The entropy of sub-graph sizes, which acts as a "fracturedness" measure, showing how far the document is from having one single, totally connected temporal graph including all TLINKs;

• The distribution of sub-graph sizes.

Even though sub-graphs are populated by processing the 
two intervals of a TLINK at the same time, it is possible 
to have a sub-graph containing just one node, in the case 
of a TLINK loop (Section 2.3.3.). Note that a document 
containing intervals but no temporal links between them is 
marked as "un-fractured", as this check ignores any items 
not referenced at least once by a temporal link. 
Here is sample output from an attempt to identify discon- 
nected sub-graphs: 

cavat> check split_graph in 3

# Split graph detection v1 loaded

# Checking wsj_0927.tml (id 3)

Subgraphs found: 13 - composed of 69 nodes and linked by 65 TLINKS.
Isolated subgraphs, that contain just one TLINK: 5 (making up 38.5% of all subgraphs / consuming 14.5% of all nodes / described by 7.7% of all TLINKs);
Mean graph size 5.3 nodes; largest subgraph (size 35) has 50.7% of all nodes.
Entropy of subgraph sizes: 0.448277644573
2 nodes: (5)
3 nodes: (4)
4 nodes: (3)
35 nodes: (1)

2.3.3. Superfluous TLINKs 

Some TLINKs in TimeML corpora associate an event with itself. For example:

<TLINK lid="l67" relType="IDENTITY" eventInstanceID="ei1241" relatedToEventInstance="ei1241" />

In this case, the only information conveyed is that ei1241 is identical to itself, making this a redundant TLINK. CAVaT includes a check that identifies TLINKs where both arguments reference the same event instance or the same event. Although some TLINK loops may be detected by consistency checking, those which specify a SIMULTANEOUS or IDENTITY relation will not be, since such a loop only asserts that an interval's endpoints are equal to themselves.

Below is sample output showing some superfluous TLINKs:

cavat> check tlink_loop in 165 159 143

# TLINK loop checker v1 loaded

# Checking ABC19980304.1830.1636.tml (id 165)
TLINK ID 123 may be a loop (eventID match), type INCLUDES, event ei286 / ei288 - check document manually

# Checking wsj_1013.tml (id 159)
TLINK ID 1107 loops directly (instanceID match), type IDENTITY, event ei2495 / ei2495

# Checking wsj_0586.tml (id 143)
TLINK ID 1192 loops directly (instanceID match), type BEFORE, event ei1404 / ei1404
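A loop check of this kind can be sketched as two comparisons per TLINK. This is an assumed reconstruction from the description and sample output, not CAVaT's module: links are hypothetical (lid, relType, arg1, arg2) tuples over event instance IDs, and `instance_event` maps each instance ID to its EVENT ID (MAKEINSTANCE's eventID). As a further assumption, the harmless loop types are taken from Table 2 as the relations that map only to equalities, which adds DURING and DURING_INV to the SIMULTANEOUS and IDENTITY cases named above.

```python
# Relation types that Table 2 maps to equalities only: a loop of one of these
# types is redundant, while a loop of any other type is contradictory.
EQUALITY_ONLY = {"SIMULTANEOUS", "IDENTITY", "DURING", "DURING_INV"}


def find_loops(tlinks, instance_event):
    """
    tlinks: hypothetical (lid, relType, arg1, arg2) tuples over event instance IDs.
    instance_event: maps an instance ID to its EVENT ID (MAKEINSTANCE's eventID).
    Returns (lid, match kind, relType, verdict) for each suspected loop.
    """
    report = []
    for lid, rel_type, arg1, arg2 in tlinks:
        if arg1 == arg2:
            verdict = "redundant" if rel_type in EQUALITY_ONLY else "contradictory"
            report.append((lid, "instanceID match", rel_type, verdict))
        else:
            event1 = instance_event.get(arg1)
            if event1 is not None and event1 == instance_event.get(arg2):
                # Two instances of one event can be legitimate (see Section 5.1).
                report.append((lid, "eventID match", rel_type, "check manually"))
    return report
```

The eventID case is only flagged, not condemned, because distinct instances of one EVENT may stand in a genuine temporal relation.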



2.3.4. Orphaned object details 

There is not yet a definition of TimeML annotation completeness that states a minimal satisfactory level of annotation for a document. In the absence of such a definition, it is not a mistake to annotate entities without attaching them to anything else in the document. However, we believe that, wherever possible, every interval should be connected to at least one other interval, and that annotating entities that neither contribute nor relate to any other annotated information is superfluous. For example, if one chooses to mark text as a temporal signal, a related link or event instance should reference the signal; if the signal conveys no temporal information, it should not be annotated at all.

To this end, CAVaT includes a module that recognises five cases of objects attached to nothing else, and reports such orphaned objects. Firstly, any TIMEX3 that is not related by any link is orphaned. Secondly, any event instance (from MAKEINSTANCE) that is not referenced by a link is orphaned. Thirdly, an EVENT that is never instantiated is unattached, as current TimeML syntax requires instantiation before EVENTs can be linked to anything else. Fourthly, instances that come from non-existent or mislabelled EVENTs are also orphans. Finally, SIGNALs that are not referenced by any link or event instance (as in our example above) are included in the list of orphaned objects.
Here is the sample output from a check for orphans: 

cavat> check orphans in wsj_0927.tml
# Orphaned tag detection v1 loaded
TIMEX3 t104 not in any link
TIMEX3 t131 not in any link
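The five orphan cases reduce to set differences. The sketch below assumes a hypothetical in-memory summary of one document (sets of TIMEX3, EVENT and SIGNAL IDs, MAKEINSTANCE rows, and links with their argument and signal IDs); it is not CAVaT's module, and its message wording only imitates the sample output above.

```python
def find_orphans(timex_ids, event_ids, signal_ids, instances, links):
    """
    instances: {eiid: (eventID, signalID or None)} from MAKEINSTANCE tags.
    links: list of (arg1, arg2, signalID or None) for TLINKs, SLINKs and ALINKs.
    Returns one message per orphaned object.
    """
    linked = {arg for a1, a2, _ in links for arg in (a1, a2)}
    used_signals = ({sig for _, _, sig in links if sig}
                    | {sig for _, sig in instances.values() if sig})
    instantiated = {event_id for event_id, _ in instances.values()}
    messages = []
    # 1. TIMEX3s not related by any link.
    messages += [f"TIMEX3 {t} not in any link" for t in sorted(timex_ids - linked)]
    # 2. Event instances not referenced by any link.
    messages += [f"instance {i} not in any link" for i in sorted(set(instances) - linked)]
    # 3. EVENTs that are never instantiated.
    messages += [f"EVENT {e} never instantiated" for e in sorted(event_ids - instantiated)]
    # 4. Instances whose EVENT does not exist.
    messages += [f"instance {i} refers to missing EVENT {e}"
                 for i, (e, _) in sorted(instances.items()) if e not in event_ids]
    # 5. SIGNALs referenced by no link and no event instance.
    messages += [f"SIGNAL {s} not referenced" for s in sorted(signal_ids - used_signals)]
    return messages
```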

2.4. Limitations 

CAVaT is currently limited in the number of objects (based on TimeML tags) that it can store for a single corpus. Objects are stored in MySQL tables, and these are limited by the operating system's maximum file size. The maximum number of corpora that CAVaT can store is restricted by the operating system's limit on the number of files in a single directory.

3. Syntax 

Here we briefly introduce CAVaT's basic top-level commands and some of their more useful features. A full specification of CAVaT's syntax is available at http://code.google.com/p/cavat.



3.1. Corpus manipulation 

Commands for manipulating TimeML corpora within CAVaT begin with corpus. One may view a list of available corpora with corpus list, and use a name from the resulting list to select a corpus for querying or checking with the corpus use command. It is also possible to view any notes attached to the currently selected corpus by using corpus info. Before a corpus can be used, though, a directory of TimeML files must be imported into CAVaT using corpus import. One may also opt to fold the corpus on import (see Section 2.1); a note will be attached to the database if this has been done.

3.2. Querying 

The show command generates reports from the current corpus. Reports focus on one tag type and give information about its attributes. One can view all values for a tag with "list" reports, the distribution of values with "distribution" reports, or simply how many instances of that tag use a particular field with "state" reports.
The general format for report generation is: 

show <report type> of <tag> <field> [as <format>] 

In this format, <tag> corresponds to a TimeML tag and is one of event, instance, timex3, signal, tlink, slink or alink. As well as the attributes available for each tag, the following extra values for <field> are available:

• For TLINKs, signal text refers to the text enclosed by the start and end tags of an associated signal;

• For EVENTs, one may also reference all the attributes of a MAKEINSTANCE tag;

• In TLINKs, SLINKs and ALINKs, the arguments are referred to as arg1 and arg2, so that the CAVaT user does not have to worry about the interval type implied by attribute names (such as eventInstanceID versus timeID).

Reports are available in multiple formats, which can be specified by adding as <format> to the end of a show query.

• screen - The default choice; screen gives text formatted for display in a fixed-width font.

• csv - Output as comma-separated values.

• tex - TeX table format, including caption.

The TeX output of an example report, showing the distribution of TLINK relTypes in TimeBank v1.2, can be generated with show distribution of tlink reltype as tex, and is shown in Table 3.
One may also specify a subset of a corpus to be used for reporting, using a simple where clause. For example, one may ask:

cavat> show state of tlink signalid where reltype is after

to see how many TLINKs of type AFTER use a signal; or one may ask:

cavat> show distribution of tlink reltype where signalid is not filled

to find out which relTypes are most likely in TLINKs that do not specify a signal. As part of CAVaT's goal to be easy



Table 3: Distribution of Tlink reltype

Tlink reltype    Frequency    Proportion
BEFORE           1408         21.9%
IS_INCLUDED      1357         21.1%
AFTER            897          14.0%
IDENTITY         743          11.6%
SIMULTANEOUS     671          10.5%
INCLUDES         582          9.07%
DURING           302          4.71%
ENDED_BY         177          2.76%
ENDS             76           1.18%
BEGUN_BY         70           1.09%
BEGINS           61           0.95%
IAFTER           39           0.608%
IBEFORE          34           0.53%
DURING_INV       1            0.0156%
Total            6418





to use and close to natural language, there are multiple valid 
syntaxes for filled/unfilled attributes. 
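The counting behind "distribution" and "state" reports can be sketched over a list of tag rows. This is an assumption-level illustration, not CAVaT's implementation: rows are hypothetical attribute dictionaries, the where clause is reduced to an optional Python predicate, "filled" means present and non-empty, and percentages are rounded to one decimal place.

```python
from collections import Counter


def distribution(rows, field, where=None):
    """Return (value, frequency, percentage) for `field`, most frequent first."""
    selected = [r for r in rows if where is None or where(r)]
    counts = Counter(r.get(field) for r in selected)
    total = len(selected)
    return [(value, n, round(100.0 * n / total, 1)) for value, n in counts.most_common()]


def state(rows, field, where=None):
    """Count rows whose `field` is filled (present and non-empty) versus unfilled."""
    selected = [r for r in rows if where is None or where(r)]
    filled = sum(1 for r in selected if r.get(field))
    return {"filled": filled, "unfilled": len(selected) - filled}
```

The second example query above would correspond to `distribution(tlinks, "relType", where=lambda r: not r.get("signalID"))`. As a check on the arithmetic, 718 filled signalIDs out of 6,418 TLINKs gives the 11.2% reported in Section 4.1.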

3.3. Browsing 

The ability to examine annotated entities in a TimeML corpus is an essential part of investigative research. To enable this, CAVaT includes the browse command.
Browsing allows the user to select a document (with browse doc, followed by a document ID or filename) and then view any tag within that document. Associated data is also shown; for example, if one browses an EVENT tag, any related MAKEINSTANCE tags are also listed. One may view the tag in three formats: screen, the default; csv, as two rows of comma-separated values (the first with attribute names as column headings); or timeml, giving valid TimeML for the requested object. The syntax for these is the same as for show commands: simply append as <output type> to the browse command.
The document selected for browsing is also used as the default document for checks, which are detailed in Section 2.3.
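The csv and timeml browse formats amount to re-serialising one tag's attributes. As a hedged sketch (the function names are hypothetical, and attributes are emitted in the order of the input dictionary), Python's csv module and xml.sax.saxutils are sufficient:

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def as_csv(attrs):
    """Two CSV rows: attribute names as column headings, then their values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(attrs.keys())
    writer.writerow(attrs.values())
    return buf.getvalue()


def as_timeml(tag, attrs, text=None):
    """Serialise one tag as TimeML; an empty element when it encloses no text."""
    attr_str = " ".join(f"{k}={quoteattr(v)}" for k, v in attrs.items())
    if text is None:
        return f"<{tag} {attr_str} />"
    return f"<{tag} {attr_str}>{escape(text)}</{tag}>"
```

quoteattr and escape handle quoting and XML special characters, so attribute values and enclosed tokens survive the round trip intact.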

4. Example tasks 

Below are some examples of using CAVaT to address real research problems. All are based on TimeBank v1.2.

4.1. Show all temporal links that employ a signal 

As part of research toward better automatic TLINK annotation, we wanted to know what proportion of TLINKs in a corpus had been annotated as employing a signal.

cavat> show state of tlink signalid 
Count State of Tlink signalid 

718 signalid filled (11.2%) 
5700 signalid unfilled (88.8%) 

The state keyword here treats signalID as having two states: filled or unfilled. A TLINK's signalID field will either be empty/absent or contain a reference to a signal annotated in the text; for this task, we do not care which specific signal is referenced.



Table 4: Distribution of Event part-of-speech

Event pos      Frequency    Proportion
VERB           5122         64.5%
NOUN           2225         28.0%
OTHER          299          3.77%
ADJECTIVE      266          3.35%
PREPOSITION    28           0.353%
Total          7940





Table 5: Distribution of Tlink signal text when Reltype is "before"

Signal text                   Frequency    Proportion
before                        24           31.6%
Previously                    10           13.2%
by                            7            9.21%
already                       6            7.89%
Earlier                       6            7.89%
until                         5            6.58%
then                          4            5.26%
followed by                   2            2.63%
prior to                      2            2.63%
Other signals, frequency 1    10           13.2%
Total                         76





4.2. Dealing with ambiguous "part of speech" values 

Many event instances in TimeBank have pos="OTHER". This is a problem when, for example, using WordNet to lemmatise event strings. The distribution in Table 4 can be created with the command:

cavat> show distribution of event pos 

After this, we would like to view event text that is classified 
as other, using the following query: 

cavat> show list of event text where pos is other 

#10.86 

#39.8 million 

#54.8 million

$1 

$1.05 

(truncated) 

The result suggests that, as well as the more typical verbs, these event tokens include at least some numeric values. This led us to substitute all currency and numeric event strings with representative tokens, as a feature for a CRF classifier, which yielded a performance increase in TLINK classification (in unpublished results).
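Such a substitution can be sketched with regular expressions. The placeholder tokens, patterns and function name below are our own assumptions, since the exact scheme is not specified here. In the Penn Treebank's WSJ text, "#" stands for the pound sign, which is why both "#" and "$" are treated as currency markers.

```python
import re

# Hypothetical patterns: an optional scale word such as "million" may follow the number.
_SCALE = r"(?:\s+(?:million|billion|trillion))?"
CURRENCY = re.compile(r"[$#]\s*\d[\d,]*(?:\.\d+)?" + _SCALE, re.IGNORECASE)
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?" + _SCALE + r"%?", re.IGNORECASE)


def normalise_event_text(text):
    """Replace currency and numeric event strings with a representative token."""
    stripped = text.strip()
    if CURRENCY.fullmatch(stripped):
        return "<CURRENCY>"
    if NUMBER.fullmatch(stripped):
        return "<NUMBER>"
    return stripped
```

Collapsing every amount onto one token lets a sequence classifier share evidence across otherwise unique strings such as "#39.8 million" and "$1.05".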

4.3. Which signals does the before relation use? 

Sometimes, particular relation types are strongly suggested 
by related signals. To determine the signal texts used with 
BEFORE TLINKs, one may query: 

cavat> show distribution of tlink signaltext where reltype is before

From the results in Table 5, we can see that the token "before" suggests a BEFORE relation, but that the majority of annotated BEFORE relations do not employ this signal: from Table 3, there are 1408 such relations in total, only 24 of which (about 1.7%) use the signal, and only 76 (about 5.4%) use any signal at all. This indicates that a relation classifier relying solely on such signals would be of little use.



4.4. Superfluous TLINK checking 

One may want to find instances where a link has been made 
between an entity and itself. We have an error checking 
module for this, tlink_loop: 

cavat> check tlink_loop in WSJ910225-0066.tml
TLINK ID 1383 matches, type IS_INCLUDED, event ei1482
TLINK ID 1376 matches, type AFTER, event ei1454
TLINK ID 1345 matches, type AFTER, event ei1356

One can instead specify in all to search the entire corpus for similar mis-annotations.

5. Validation of a sample corpus 

As we can now load and process any TimeML corpus, and have a set of advanced validation tests, it is natural to apply them to existing TimeML-annotated corpora. In this section, we present the results of running CAVaT's check modules on TimeBank v1.2. This corpus is not new and has been amended and improved by the community (Boguraev et al., 2007), so it may contain far fewer errors than freshly annotated documents.

5.1. Checking for loops 

We used the tlink_loop module (Section 2.3.3.) on the 
corpus. This identifies TLINKs where both arguments are 
the same event or event instance. 

Of TimeBank's 183 documents, 19 have at least one TLINK containing such a loop, and there are 26 loops in total. Of these loops, 10 are on TLINKs of type SIMULTANEOUS or IDENTITY; such TLINKs will not make a graph inconsistent, but are certainly redundant. The remaining 16 loops, of other types, will cause an inconsistency. All but one of the loops found are temporal links where both arguments reference the same event instance; only one references two different instances of the same event (TLINK l23, in document ABC19980304.1830.1636.tml). The TimeML in question is as follows:

But they still have <EVENT eid="e28" class="I_ACTION">catching</EVENT> up to do two hundred and thirty four Americans have <EVENT eid="e30" class="OCCURRENCE">flown</EVENT> in space, only twenty six of them women.

<MAKEINSTANCE eventID="e30" eiid="ei286" tense="PRESENT" aspect="PERFECTIVE" polarity="POS" cardinality="234" pos="VERB"/>

<MAKEINSTANCE eventID="e30" eiid="ei288" tense="PRESENT" aspect="PERFECTIVE" polarity="POS" cardinality="26" pos="VERB"/>

<TLINK lid="l23" relType="INCLUDES" eventInstanceID="ei286" relatedToEventInstance="ei288" />

In this case, the annotation suggests that during the flying in 
space of 234 Americans, 26 women flew, which is a correct 
interpretation of the text. CAVaT recommends the manual 
examination of event ID loops upon their detection. All 
the other tags reported by this check indicate redundant or 
incorrect annotations. 

5.2. Checking for consistent graphs 

Since the consistency checker uses a novel method (see Section 2.3.1), we verified its output by comparing it with that of SputLink and of CAVaT's loop detection, and by finding an explanation for every inconsistency. A small test corpus of TimeML documents is also included with CAVaT for assuring the accuracy of this tool.



SputLink would not report an inconsistency with a TLINK loop that was not of type SIMULTANEOUS or IDENTITY; many of the inconsistent documents were found faulty by both SputLink and CAVaT. Some documents had an erroneous initial TLINK configuration; most faults were subtler than this, and their discovery required a closure attempt.

5.3. Checking for split graphs 

The split_graph module checks for documents whose temporal graphs contain sets of disconnected TLINKs. No document in TimeBank has a fully connected temporal graph, with a path traceable between every pair of intervals. The "best-connected" (least fractured) document is wsj_0144.tml, which has 34 intervals split into only two sub-graphs: one containing 32 intervals, the other two.
The most fractured document is wsj_133.tml, which is split into 12 sub-graphs with a mean size of only 2.7 intervals (a single TLINK connected to no other creates a graph of size 2). Despite having 32 intervals in total to connect, the largest sub-graphs in this document include only 4 intervals.

5.4. Replication 

The results above can be replicated by downloading CAVaT v1, obtaining a copy of TimeBank v1.2⁵, importing the data/timeml/ subdirectory of TimeBank, and running check test in all in CAVaT, where test is the name of the desired check module.

6. Conclusion and future work 

We have described CAVaT, a language-independent tool 
which adds a layer of abstraction between TimeML markup 
and human researchers, making data easier to analyse, and 
patterns easier to spot. It also helps identify trouble spots 
in annotations. 

TimeML corpora are at present available only in Romanian (Forascu et al., 2007) and English, which makes multilingual testing of the tool difficult. However, the markup is not language-specific, and results are likely to be equally useful across many languages; this may be shown using the test corpora to be released for TempEval-2⁶, which will include English, Italian, Spanish, Chinese, Korean and French.

In future work, CAVaT may be able to provide repair suggestions. These may include fixes for inconsistent graphs, as well as suggestions for missing fields based on lexical resources, third-party tools or heuristics. The modular error checks allow the creation of an open database of TimeML validations, to help improve the integrity of all TimeML corpora. Check modules that match the output of high-confidence rule-based tools such as S2T (Verhagen and Pustejovsky, 2008) can also be added.

The consistency checker is "a TimeML closure engine that uses the precise relations behind the scenes" (Verhagen, 2005). It may therefore be used to discover empirically how often the existing leading closure tool, SputLink, introduces incorrect links during closure.

5 LDC catalogue number LDC2006T08 
6 http://www.timeml.org/tempeval2/